Page-table pool and EL1 shim previously sat at fixed low addresses
[0x10000, 0x400000), colliding with low-linked ET_EXECs. Android
linker64 binaries link at 0x200000 and the loader accepted them, but
sys_{mprotect,munmap}, sys_mmap with MAP_FIXED, and rt_sigreturn then
rejected any operation on the overlapping pages with a bare EINVAL as
soon as the binary tried to apply RELRO to its data segment.
Relocate the page-table pool, shim code, and shim data into a 4 MiB
reserve placed just below g->interp_base, in the dead zone between
g->mmap_limit and g->interp_base. PT_POOL_BASE, SHIM_BASE, and
SHIM_DATA_BASE become runtime guest_t fields computed by compute_infra_layout
from guest_size; for 36-bit IPA the reserve sits at [60 GiB - 4 MiB,
60 GiB), for 40-bit IPA at [1020 GiB - 4 MiB, 1020 GiB). Two helpers,
guest_range_hits_infra and guest_addr_in_infra, retarget the four infra
guards (mmap MAP_FIXED, munmap, mprotect, rt_sigreturn) at the new range
without weakening their security intent. The 64 KiB null-guard slot at
the bottom of the reserve is covered too, so guest mmap state cannot
semantically reserve it either.
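A minimal sketch of the layout arithmetic and the overlap test such guards need. The struct and function names are hypothetical, and so is the formula interp_base = guest_size - 4 GiB. That formula is an assumption chosen because it reproduces the 60 GiB (36-bit) and 1020 GiB (40-bit) figures above; this is not the project's compute_infra_layout. The overlap test is written so that addr + len never has to be computed, which keeps a hostile length near 2^64 from wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define INFRA_SIZE (4 * MiB) /* size of the reserve, per the commit text */

/* Hypothetical subset of guest_t holding only what this sketch needs. */
typedef struct {
    uint64_t guest_size;  /* 1 << ipa_bits */
    uint64_t interp_base;
    uint64_t infra_base;  /* first byte of the reserve, null-guard slot included */
    uint64_t infra_end;   /* one past the last byte of the reserve */
} guest_sketch_t;

/* ASSUMPTION: interp_base = guest_size - 4 GiB. */
static void compute_infra_layout_sketch(guest_sketch_t *g, unsigned ipa_bits)
{
    g->guest_size = 1ULL << ipa_bits;
    g->interp_base = g->guest_size - 4 * GiB;
    g->infra_end = g->interp_base;            /* reserve sits just below interp_base */
    g->infra_base = g->interp_base - INFRA_SIZE;
}

/* True if [addr, addr + len) overlaps [infra_base, infra_end). */
static bool range_hits_infra_sketch(const guest_sketch_t *g, uint64_t addr, uint64_t len)
{
    if (len == 0 || addr >= g->infra_end)
        return false;
    if (addr >= g->infra_base)
        return true;                          /* range starts inside the reserve */
    return len > g->infra_base - addr;        /* range starts below and reaches it */
}

/* True if a single address falls inside the reserve. */
static bool addr_in_infra_sketch(const guest_sketch_t *g, uint64_t addr)
{
    return addr >= g->infra_base && addr < g->infra_end;
}
```

With this layout a binary linked at 0x200000 no longer overlaps the reserve, while a MAP_FIXED request that touches the null-guard slot at infra_base is still rejected.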
Bump fork IPC to v9 to carry elf_load_min so nested forks from low-linked
ET_EXECs see the actual load address rather than the legacy ELF_DEFAULT_BASE
constant. In the child path, validate hdr.ipa_bits and hdr.guest_size,
and check that hdr.pt_pool_next and hdr.ttbr0 are page-aligned and lie
inside the pool, before any size-derived arithmetic, so a malformed
header cannot underflow interp_base or misalign the page-table walker.
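The ordering matters: guest_size has to be trusted before anything subtracts from it. This sketch shows that order. Several details are assumptions rather than the real fork IPC: the header layout, the accepted ipa_bits values (36 and 40, the two cases named above), a 4 KiB granule, the pool offset and size inside the reserve, and that hdr.ttbr0 carries a bare table address with no ASID bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FORK_IPC_VERSION 9u
#define PAGE_SIZE_SKETCH 4096ULL     /* ASSUMPTION: 4 KiB translation granule */
#define KiB (1ULL << 10)
#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define INFRA_SIZE (4 * MiB)
#define PT_POOL_OFFSET (64 * KiB)    /* HYPOTHETICAL: pool starts above the null guard */
#define PT_POOL_SIZE (2 * MiB)       /* HYPOTHETICAL pool size */

/* Hypothetical v9 header; only the fields named in the commit message. */
typedef struct {
    uint32_t version;
    uint32_t ipa_bits;
    uint64_t guest_size;
    uint64_t pt_pool_next;  /* next free page-table page */
    uint64_t ttbr0;         /* ASSUMPTION: bare root-table address */
    uint64_t elf_load_min;  /* new in v9 */
} fork_hdr_sketch_t;

static bool page_aligned(uint64_t v)
{
    return (v & (PAGE_SIZE_SKETCH - 1)) == 0;
}

static bool validate_fork_hdr_sketch(const fork_hdr_sketch_t *h)
{
    if (h->version != FORK_IPC_VERSION)
        return false;
    /* Step 1: pin down the size inputs before any arithmetic uses them. */
    if (h->ipa_bits != 36 && h->ipa_bits != 40)
        return false;
    if (h->guest_size != (1ULL << h->ipa_bits))
        return false;
    /* Step 2: guest_size is now at least 64 GiB, so this cannot underflow. */
    uint64_t interp_base = h->guest_size - 4 * GiB;
    uint64_t pool_base = interp_base - INFRA_SIZE + PT_POOL_OFFSET;
    uint64_t pool_end = pool_base + PT_POOL_SIZE;
    /* Step 3: pt_pool_next may equal pool_end (pool fully used). */
    if (!page_aligned(h->pt_pool_next) ||
        h->pt_pool_next < pool_base || h->pt_pool_next > pool_end)
        return false;
    /* ttbr0 must name a page inside the pool. */
    if (!page_aligned(h->ttbr0) || h->ttbr0 < pool_base || h->ttbr0 >= pool_end)
        return false;
    return true;
}
```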
Plumb guest_t through thread_alloc_sp_el1 and record the slot index in
thread_entry_t so thread_free_sp_el1_locked can clear the bitmap from
teardown contexts (thread_{deactivate,destroy_all_vcpus,ptrace_wait})
that lack a guest_t reference.
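The likely reason the slot index has to be recorded: once the SP_EL1 stacks live at a per-guest shim_data_base, a slot can no longer be recovered from the stack pointer alone without g. This sketch illustrates the split and does not reproduce the project's code. The process-wide bitmap, slot count, stack size, and all struct and function names are hypothetical, and the caller is assumed to hold the thread-table lock, as the _locked suffix suggests.

```c
#include <assert.h>
#include <stdint.h>

#define SP_EL1_SLOTS 64                 /* HYPOTHETICAL slot count */
#define SP_EL1_STACK_SIZE (16 * 1024)   /* HYPOTHETICAL per-thread stack size */

/* HYPOTHETICAL process-wide bitmap; bit i set means slot i is in use.
 * Caller holds the thread-table lock for both alloc and free. */
static uint64_t sp_el1_bitmap;

typedef struct { uint64_t shim_data_base; } guest_sketch_t;
typedef struct { int sp_el1_slot; uint64_t sp_el1; } thread_entry_sketch_t;

/* Needs g: the stack address depends on the per-guest shim_data_base. */
static int thread_alloc_sp_el1_sketch(const guest_sketch_t *g, thread_entry_sketch_t *t)
{
    for (int i = 0; i < SP_EL1_SLOTS; i++) {
        uint64_t bit = 1ULL << i;
        if (!(sp_el1_bitmap & bit)) {
            sp_el1_bitmap |= bit;
            t->sp_el1_slot = i;          /* remembered for teardown */
            /* Stacks grow down, so SP starts at the top of slot i. */
            t->sp_el1 = g->shim_data_base + (uint64_t)(i + 1) * SP_EL1_STACK_SIZE;
            return 0;
        }
    }
    return -1;                           /* all slots in use */
}

/* No guest_t needed: the recorded index identifies the bit to clear. */
static void thread_free_sp_el1_locked_sketch(thread_entry_sketch_t *t)
{
    if (t->sp_el1_slot < 0)
        return;                          /* nothing held */
    sp_el1_bitmap &= ~(1ULL << t->sp_el1_slot);
    t->sp_el1_slot = -1;
    t->sp_el1 = 0;
}
```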
Add tests/test-fork-lowbase.c, a static ET_EXEC linked at 0x200000 that
exercises a nested fork. The grandchild completes only when the
intermediate child preserves elf_load_min across the IPC handoff.
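The following POSIX C sketch shows the general shape such a test can take. It is not the contents of tests/test-fork-lowbase.c. Run natively it passes trivially. Its value comes from running it under the emulator as a static, non-PIE ET_EXEC linked at 0x200000. With GNU ld, one way to build that is -static -no-pie -Wl,-Ttext-segment=0x200000, though the real test's build rule may differ. The marker global and helper names are illustrative. The marker is written after load, so the grandchild only sees the expected value if the ELF+brk snapshot survived both fork handoffs.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in the writable data segment of the low-linked executable. */
static volatile uint64_t marker = 0x1234;

/* Wait for pid; return its exit code, or -1 if it did not exit normally. */
static int wait_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Returns 0 only when parent -> child -> grandchild all complete. */
static int run_nested_fork(void)
{
    marker = 0xC0FFEE;                   /* set after load, before the first fork */
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t grandchild = fork();       /* the nested fork under test */
        if (grandchild < 0)
            _exit(2);
        if (grandchild == 0)
            _exit(marker == 0xC0FFEE ? 0 : 3);
        _exit(wait_exit(grandchild) == 0 ? 0 : 4);
    }
    return wait_exit(child);
}
```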
Summary by cubic
Moves the page-table pool and EL1 shim out of low user VA to avoid collisions with low-linked ET_EXEC binaries. Places them in a 4 MiB reserve just below interp_base and preserves the true ELF load base across fork (IPC v9) to fix nested forks.

Bug Fixes
- pt_pool_base, shim_base, and shim_data_base are now per-guest.
- Adds guest_range_hits_infra and guest_addr_in_infra to block mmap(MAP_FIXED), munmap, mprotect, and rt_sigreturn from touching infra memory.
- Fork IPC v9 carries elf_load_min; the child validates ipa_bits, guest_size, and page-aligned in-pool pt_pool_next/ttbr0.
- Adds test-fork-lowbase (non-PIE at 0x200000) to verify nested fork behavior.

Refactors
- Records elf_load_min and uses it when snapshotting ELF+brk for fork.
- thread_alloc_sp_el1(g, t) now takes guest_t and records the slot index so teardown can free without a guest_t.
- Shim accesses go through g->shim_base/g->shim_data_base; icache invalidation and used-region reporting are updated accordingly.

Written for commit 40a759e.